R Markdown

[Q.N.1] Attach the AdultUCI data in R

library(arulesViz)
## Warning: package 'arulesViz' was built under R version 4.1.2
## Warning: package 'arules' was built under R version 4.1.2
library(arules)

Here I load the required packages, arulesViz and arules.

data("AdultUCI") 

This loads the built-in AdultUCI data set.

[Q.N.2] Check the class, structure, dimension, head and tail of the attached data and write interpretations

class(AdultUCI)
## [1] "data.frame"

The class of “AdultUCI” is data.frame.

str(AdultUCI)
## 'data.frame':    48842 obs. of  15 variables:
##  $ age           : int  39 50 38 53 28 37 49 52 31 42 ...
##  $ workclass     : Factor w/ 8 levels "Federal-gov",..: 7 6 4 4 4 4 4 6 4 4 ...
##  $ fnlwgt        : int  77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
##  $ education     : Ord.factor w/ 16 levels "Preschool"<"1st-4th"<..: 14 14 9 7 14 15 5 9 15 14 ...
##  $ education-num : int  13 13 9 7 13 14 5 9 14 13 ...
##  $ marital-status: Factor w/ 7 levels "Divorced","Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
##  $ occupation    : Factor w/ 14 levels "Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
##  $ relationship  : Factor w/ 6 levels "Husband","Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
##  $ race          : Factor w/ 5 levels "Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
##  $ sex           : Factor w/ 2 levels "Female","Male": 2 2 2 2 1 1 1 2 1 2 ...
##  $ capital-gain  : int  2174 0 0 0 0 0 0 0 14084 5178 ...
##  $ capital-loss  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ hours-per-week: int  40 13 40 40 40 40 16 45 50 40 ...
##  $ native-country: Factor w/ 41 levels "Cambodia","Canada",..: 39 39 39 39 5 39 23 39 39 39 ...
##  $ income        : Ord.factor w/ 2 levels "small"<"large": 1 1 1 1 1 1 1 2 2 2 ...

From the above results we can see that, of the 15 variables in AdultUCI, six are integer (int), seven are unordered factors, and two (education and income) are ordered factors.

dim(AdultUCI)
## [1] 48842    15

AdultUCI has 48842 rows and 15 variables.

head(AdultUCI)
##   age        workclass fnlwgt education education-num     marital-status
## 1  39        State-gov  77516 Bachelors            13      Never-married
## 2  50 Self-emp-not-inc  83311 Bachelors            13 Married-civ-spouse
## 3  38          Private 215646   HS-grad             9           Divorced
## 4  53          Private 234721      11th             7 Married-civ-spouse
## 5  28          Private 338409 Bachelors            13 Married-civ-spouse
## 6  37          Private 284582   Masters            14 Married-civ-spouse
##          occupation  relationship  race    sex capital-gain capital-loss
## 1      Adm-clerical Not-in-family White   Male         2174            0
## 2   Exec-managerial       Husband White   Male            0            0
## 3 Handlers-cleaners Not-in-family White   Male            0            0
## 4 Handlers-cleaners       Husband Black   Male            0            0
## 5    Prof-specialty          Wife Black Female            0            0
## 6   Exec-managerial          Wife White Female            0            0
##   hours-per-week native-country income
## 1             40  United-States  small
## 2             13  United-States  small
## 3             40  United-States  small
## 4             40  United-States  small
## 5             40           Cuba  small
## 6             40  United-States  small

head(AdultUCI) shows the first 6 rows with all variables.

tail(AdultUCI)
##       age    workclass fnlwgt education education-num     marital-status
## 48837  33      Private 245211 Bachelors            13      Never-married
## 48838  39      Private 215419 Bachelors            13           Divorced
## 48839  64         <NA> 321403   HS-grad             9            Widowed
## 48840  38      Private 374983 Bachelors            13 Married-civ-spouse
## 48841  44      Private  83891 Bachelors            13           Divorced
## 48842  35 Self-emp-inc 182148 Bachelors            13 Married-civ-spouse
##            occupation   relationship               race    sex capital-gain
## 48837  Prof-specialty      Own-child              White   Male            0
## 48838  Prof-specialty  Not-in-family              White Female            0
## 48839            <NA> Other-relative              Black   Male            0
## 48840  Prof-specialty        Husband              White   Male            0
## 48841    Adm-clerical      Own-child Asian-Pac-Islander   Male         5455
## 48842 Exec-managerial        Husband              White   Male            0
##       capital-loss hours-per-week native-country income
## 48837            0             40  United-States   <NA>
## 48838            0             36  United-States   <NA>
## 48839            0             40  United-States   <NA>
## 48840            0             50  United-States   <NA>
## 48841            0             40  United-States   <NA>
## 48842            0             60  United-States   <NA>

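tail(AdultUCI) shows the last 6 rows with all variables. Note the <NA> values: income is missing in all six rows, and row 48839 also lacks workclass and occupation.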

[Q.N.3] Remove “fnlwgt” and “education-num” variables from the attached data and explain the logic you have used here

AdultUCI[['fnlwgt']] <- NULL
AdultUCI[["education-num"]] <-NULL

To delete a column, we assign NULL to it, which removes it from the data frame. Here the two columns fnlwgt and education-num are deleted. fnlwgt is a sampling weight and education-num is a numeric copy of education, so neither adds information for rule mining.
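The NULL assignment can be tried on a small example first. This is a minimal sketch on a hypothetical two-row data frame named toy (not the real AdultUCI data); only base R is assumed.

```r
# Hypothetical toy data frame standing in for AdultUCI
toy <- data.frame(age = c(39, 50),
                  fnlwgt = c(77516, 83311),
                  `education-num` = c(13, 13),
                  check.names = FALSE)  # keep the hyphen in the column name

# Assigning NULL to a column removes it from the data frame
toy[["fnlwgt"]] <- NULL
toy[["education-num"]] <- NULL

names(toy)  # only "age" remains
```

The double-bracket form is used because a name such as education-num contains a hyphen and cannot be written after `$` without backticks.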

str(AdultUCI$age)
##  int [1:48842] 39 50 38 53 28 37 49 52 31 42 ...

[Q.N.4] Convert “age” as ordered factor variables with cuts at 15, 25, 45, 65 and 100 and label it as “Young”, “Middle-aged”, “Senior” and “Old”

AdultUCI[[ "age"]] <- ordered(cut(AdultUCI[[ "age"]], c(15,25,45,65,100)),
labels = c("Young", "Middle-aged", "Senior", "Old"))

Initially the age variable is of type int. Here we convert it into an ordered factor with four levels: Young, Middle-aged, Senior and Old.
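To see where the boundaries fall, the same cuts can be applied to a few hand-picked ages. This is a sketch on a hypothetical vector ages; it passes the labels directly to cut(), which is an alternative spelling and should give the same levels as the ordered(cut(...), labels = ...) form used above.

```r
# Hypothetical ages chosen to sit on and around the cut points
ages <- c(17, 25, 26, 45, 46, 65, 90)

# cut() builds right-closed intervals by default: (15,25], (25,45], (45,65], (65,100]
age_groups <- cut(ages,
                  breaks = c(15, 25, 45, 65, 100),
                  labels = c("Young", "Middle-aged", "Senior", "Old"),
                  ordered_result = TRUE)

data.frame(ages, age_groups)
```

Because the intervals are right-closed, an age of exactly 25 is still Young, and an age of exactly 15 would fall outside every interval and become NA.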

[Q.N.5] Convert the “hours-per-week” as ordered factor variable with cuts at 0, 25, 40, 60, 168 and label it as “Part-time”, “Full-time”, “Over-time” and “Workaholic”

AdultUCI[[ "hours-per-week"]] <- ordered(cut(AdultUCI[[ "hours-per-week"]],
c(0,25,40,60,168)),
labels = c("Part-time", "Full-time", "Over-time", "Workaholic"))

Initially the hours-per-week variable is of type int. Here we convert it into an ordered factor with four levels: Part-time, Full-time, Over-time and Workaholic.

[Q.N.6] Convert the “capital-loss” as ordered factor variable with cuts at –Inf, 0, median and Inf and label it as “None”, “Low” and “High”

AdultUCI[[ "capital-gain"]] <- ordered(cut(AdultUCI[[ "capital-gain"]],
c(-Inf,0,median(AdultUCI[[ "capital-gain"]][AdultUCI[[ "capital-gain"]]>0]),
Inf)), labels = c("None", "Low", "High"))

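Initially the capital-gain variable is of type int. Here we convert it into an ordered factor with three levels: None, Low and High, where the cut between Low and High is the median of the positive values. Note that this chunk converts capital-gain although the question names capital-loss; capital-loss is converted in the next chunk, so both variables end up recoded.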
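The median is taken over the positive values only because most values are 0: the median of the whole column would itself be 0, which would repeat the break at 0. A minimal sketch on a hypothetical vector gain illustrates this with base R.

```r
# Hypothetical capital-gain values: mostly zeros, a few positive amounts
gain <- c(0, 0, 2174, 0, 14084, 5178, 0)

median(gain)                          # 0, useless as a break next to the existing 0
cut_point <- median(gain[gain > 0])   # median of the positive values only: 5178

gain_level <- cut(gain,
                  breaks = c(-Inf, 0, cut_point, Inf),
                  labels = c("None", "Low", "High"),
                  ordered_result = TRUE)

data.frame(gain, gain_level)
```

A value equal to the cut point is labelled Low, because the intervals are closed on the right.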

[Q.N.7] Convert the “capital-gain” as ordered factor variable with cuts at –Inf, 0, median and Inf and label it as “None”, “Low” and “High”

AdultUCI[[ "capital-loss"]] <- ordered(cut(AdultUCI[[ "capital-loss"]],
c(-Inf,0, median(AdultUCI[[ "capital-loss"]][AdultUCI[[ "capital-loss"]]>0]),
Inf)), labels = c("None", "Low", "High"))

Initially the capital-loss variable is of type int. Here we convert it into an ordered factor with three levels: None, Low and High.

[Q.N.8] Create transactions of AdultUCI data as “Adult” and check it with “Adult” command

Adult <- as(AdultUCI, "transactions")
Adult
## transactions in sparse format with
##  48842 transactions (rows) and
##  115 items (columns)

Here we create transactions from the AdultUCI data. The result is stored in sparse format with 48842 transactions (rows) and 115 items (columns); each item is one variable=level pair.
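The item count can be reasoned about without arules: every factor level becomes one item, and a row contributes one item per non-missing variable. The sketch below uses a hypothetical two-column data frame toy and base R only; it imitates the counting, not the actual conversion.

```r
# Hypothetical factor-only data frame with one missing value
toy <- data.frame(
  sex    = factor(c("Male", "Female", "Male")),
  income = factor(c("small", NA, "large"), levels = c("small", "large"))
)

# One item per factor level: 2 levels of sex + 2 levels of income
n_items <- sum(sapply(toy, nlevels))

# One item per non-missing cell in each row
items_per_row <- unname(rowSums(!is.na(toy)))

n_items
items_per_row
```

Applied to the 13 recoded variables of AdultUCI, the same level count adds up to the 115 items reported above.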

[Q.N.9] Get summary of the “Adult” and interpret it carefully

summary(Adult)
## transactions as itemMatrix in sparse format with
##  48842 rows (elements/itemsets/transactions) and
##  115 columns (items) and a density of 0.1089939 
## 
## most frequent items:
##            capital-loss=None            capital-gain=None 
##                        46560                        44807 
## native-country=United-States                   race=White 
##                        43832                        41762 
##            workclass=Private                      (Other) 
##                        33906                       401333 
## 
## element (itemset/transaction) length distribution:
## sizes
##     9    10    11    12    13 
##    19   971  2067 15623 30162 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.00   12.00   13.00   12.53   13.00   13.00 
## 
## includes extended item information - examples:
##            labels variables      levels
## 1       age=Young       age       Young
## 2 age=Middle-aged       age Middle-aged
## 3      age=Senior       age      Senior
## 
## includes extended transaction information - examples:
##   transactionID
## 1             1
## 2             2
## 3             3

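The summary shows 48842 transactions and 115 items with a density of about 0.109, meaning roughly 11% of the cells in the transaction-by-item matrix are filled. The most frequent items are capital-loss=None (46560), capital-gain=None (44807) and native-country=United-States (43832). Each transaction contains between 9 and 13 items (mean 12.53); transactions with fewer than 13 items have missing values for some variables. The transactionID entries 1, 2 and 3 are only examples of the transaction labels.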

[Q.N.10] Inspect head and tail of the “Adult” and interpret them carefully

df1 <- head(Adult)
inspect(df1)
##     items                                transactionID
## [1] {age=Middle-aged,                                 
##      workclass=State-gov,                             
##      education=Bachelors,                             
##      marital-status=Never-married,                    
##      occupation=Adm-clerical,                         
##      relationship=Not-in-family,                      
##      race=White,                                      
##      sex=Male,                                        
##      capital-gain=Low,                                
##      capital-loss=None,                               
##      hours-per-week=Full-time,                        
##      native-country=United-States,                    
##      income=small}                                   1
## [2] {age=Senior,                                      
##      workclass=Self-emp-not-inc,                      
##      education=Bachelors,                             
##      marital-status=Married-civ-spouse,               
##      occupation=Exec-managerial,                      
##      relationship=Husband,                            
##      race=White,                                      
##      sex=Male,                                        
##      capital-gain=None,                               
##      capital-loss=None,                               
##      hours-per-week=Part-time,                        
##      native-country=United-States,                    
##      income=small}                                   2
## [3] {age=Middle-aged,                                 
##      workclass=Private,                               
##      education=HS-grad,                               
##      marital-status=Divorced,                         
##      occupation=Handlers-cleaners,                    
##      relationship=Not-in-family,                      
##      race=White,                                      
##      sex=Male,                                        
##      capital-gain=None,                               
##      capital-loss=None,                               
##      hours-per-week=Full-time,                        
##      native-country=United-States,                    
##      income=small}                                   3
## [4] {age=Senior,                                      
##      workclass=Private,                               
##      education=11th,                                  
##      marital-status=Married-civ-spouse,               
##      occupation=Handlers-cleaners,                    
##      relationship=Husband,                            
##      race=Black,                                      
##      sex=Male,                                        
##      capital-gain=None,                               
##      capital-loss=None,                               
##      hours-per-week=Full-time,                        
##      native-country=United-States,                    
##      income=small}                                   4
## [5] {age=Middle-aged,                                 
##      workclass=Private,                               
##      education=Bachelors,                             
##      marital-status=Married-civ-spouse,               
##      occupation=Prof-specialty,                       
##      relationship=Wife,                               
##      race=Black,                                      
##      sex=Female,                                      
##      capital-gain=None,                               
##      capital-loss=None,                               
##      hours-per-week=Full-time,                        
##      native-country=Cuba,                             
##      income=small}                                   5
## [6] {age=Middle-aged,                                 
##      workclass=Private,                               
##      education=Masters,                               
##      marital-status=Married-civ-spouse,               
##      occupation=Exec-managerial,                      
##      relationship=Wife,                               
##      race=White,                                      
##      sex=Female,                                      
##      capital-gain=None,                               
##      capital-loss=None,                               
##      hours-per-week=Full-time,                        
##      native-country=United-States,                    
##      income=small}                                   6
df2<-tail(Adult)
inspect(df2)
##     items                                transactionID
## [1] {age=Middle-aged,                                 
##      workclass=Private,                               
##      education=Bachelors,                             
##      marital-status=Never-married,                    
##      occupation=Prof-specialty,                       
##      relationship=Own-child,                          
##      race=White,                                      
##      sex=Male,                                        
##      capital-gain=None,                               
##      capital-loss=None,                               
##      hours-per-week=Full-time,                        
##      native-country=United-States}               48837
## [2] {age=Middle-aged,                                 
##      workclass=Private,                               
##      education=Bachelors,                             
##      marital-status=Divorced,                         
##      occupation=Prof-specialty,                       
##      relationship=Not-in-family,                      
##      race=White,                                      
##      sex=Female,                                      
##      capital-gain=None,                               
##      capital-loss=None,                               
##      hours-per-week=Full-time,                        
##      native-country=United-States}               48838
## [3] {age=Senior,                                      
##      education=HS-grad,                               
##      marital-status=Widowed,                          
##      relationship=Other-relative,                     
##      race=Black,                                      
##      sex=Male,                                        
##      capital-gain=None,                               
##      capital-loss=None,                               
##      hours-per-week=Full-time,                        
##      native-country=United-States}               48839
## [4] {age=Middle-aged,                                 
##      workclass=Private,                               
##      education=Bachelors,                             
##      marital-status=Married-civ-spouse,               
##      occupation=Prof-specialty,                       
##      relationship=Husband,                            
##      race=White,                                      
##      sex=Male,                                        
##      capital-gain=None,                               
##      capital-loss=None,                               
##      hours-per-week=Over-time,                        
##      native-country=United-States}               48840
## [5] {age=Middle-aged,                                 
##      workclass=Private,                               
##      education=Bachelors,                             
##      marital-status=Divorced,                         
##      occupation=Adm-clerical,                         
##      relationship=Own-child,                          
##      race=Asian-Pac-Islander,                         
##      sex=Male,                                        
##      capital-gain=Low,                                
##      capital-loss=None,                               
##      hours-per-week=Full-time,                        
##      native-country=United-States}               48841
## [6] {age=Middle-aged,                                 
##      workclass=Self-emp-inc,                          
##      education=Bachelors,                             
##      marital-status=Married-civ-spouse,               
##      occupation=Exec-managerial,                      
##      relationship=Husband,                            
##      race=White,                                      
##      sex=Male,                                        
##      capital-gain=None,                               
##      capital-loss=None,                               
##      hours-per-week=Over-time,                        
##      native-country=United-States}               48842

[Q.N.11] Create absolute and relative item frequency plot and color it with RColorBrewer package

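library(RColorBrewer)
coul <- brewer.pal(5, "BuPu")
# Plot the full Adult transactions (df1 holds only the first 6 rows)
itemFrequencyPlot(Adult, topN=10, type="absolute", cex.names=1, col = coul)
itemFrequencyPlot(Adult, topN=10, type="relative", cex.names=1, col = coul)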

Absolute frequency is the number of transactions in which an item occurs. Relative frequency is that count divided by the total number of transactions, which is the item's support.
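The two kinds of frequency can be computed by hand on a small example. This is a sketch on four hypothetical transactions written as plain character vectors, using base R only rather than the arules plotting function.

```r
# Four hypothetical transactions, each a set of items
transactions <- list(
  c("sex=Male",   "race=White"),
  c("sex=Male",   "race=Black"),
  c("sex=Female", "race=White"),
  c("sex=Male",   "race=White")
)

# Absolute frequency: number of transactions containing each item
absolute <- table(unlist(transactions))

# Relative frequency: absolute frequency divided by the number of transactions
relative <- absolute / length(transactions)

absolute
relative
```

Here sex=Male occurs in 3 of the 4 transactions, so its absolute frequency is 3 and its relative frequency is 0.75.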

[Q.N.12] Create an apriori rule as “association.rule” with support = 1%, confidence = 80% and maximum length of the rule as 10. Get summary of this rule and interpret it carefully.

rules <- apriori(df2, 
parameter = list(supp=0.1, conf=0.8, 
maxlen=10, 
target= "rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 0 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[28 item(s), 6 transaction(s)] done [0.00s].
## sorting and recoding items ... [28 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(df2, parameter = list(supp = 0.1, conf = 0.8, maxlen = 10, :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
##  done [0.01s].
## writing ... [103788 rule(s)] done [0.02s].
## creating S4 object  ... done [0.05s].

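Apriori returns 103788 rules. Note that the call mines df2, the last six transactions of Adult, with a minimum support of 0.1 (10%), so these rules describe only those six rows and not the full AdultUCI data. With six transactions, a support of 0.1 corresponds to less than one row (the log reports an absolute minimum support count of 0), so any item combination that occurs at all is kept.

[Q.N.13] Inspect the first 10 rules and interpret it critically.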

inspect(head(sort(rules, by="support"),10))
##      lhs                               rhs                            support  
## [1]  {}                             => {capital-loss=None}            1.0000000
## [2]  {}                             => {native-country=United-States} 1.0000000
## [3]  {capital-loss=None}            => {native-country=United-States} 1.0000000
## [4]  {native-country=United-States} => {capital-loss=None}            1.0000000
## [5]  {}                             => {sex=Male}                     0.8333333
## [6]  {}                             => {capital-gain=None}            0.8333333
## [7]  {}                             => {education=Bachelors}          0.8333333
## [8]  {}                             => {age=Middle-aged}              0.8333333
## [9]  {sex=Male}                     => {capital-loss=None}            0.8333333
## [10] {capital-loss=None}            => {sex=Male}                     0.8333333
##      confidence coverage  lift count
## [1]  1.0000000  1.0000000 1    6    
## [2]  1.0000000  1.0000000 1    6    
## [3]  1.0000000  1.0000000 1    6    
## [4]  1.0000000  1.0000000 1    6    
## [5]  0.8333333  1.0000000 1    5    
## [6]  0.8333333  1.0000000 1    5    
## [7]  0.8333333  1.0000000 1    5    
## [8]  0.8333333  1.0000000 1    5    
## [9]  1.0000000  0.8333333 1    5    
## [10] 0.8333333  1.0000000 1    5

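All six transactions contain capital-loss=None and native-country=United-States, so rules [1] to [4] have a support and confidence of 1 (count 6). Rules [5] to [10] cover five of the six transactions (support 0.8333). Rules with an empty LHS ({}) only restate how frequent an item is. Every rule has a lift of 1, which means the LHS and RHS occur together no more often than expected under independence, so none of these rules shows a real association.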
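The quality measures in the table follow directly from the counts. As a worked check, the sketch below recomputes rule [9], {sex=Male} => {capital-loss=None}, from the counts shown in rows [1], [5] and [9]; the variable names are only illustrative.

```r
n          <- 6  # transactions in df2
count_lhs  <- 5  # transactions with sex=Male (row [5])
count_rhs  <- 6  # transactions with capital-loss=None (row [1])
count_both <- 5  # transactions with both items (row [9])

support    <- count_both / n                 # share of all transactions matching the rule
confidence <- count_both / count_lhs         # share of LHS transactions that also contain the RHS
coverage   <- count_lhs / n                  # support of the LHS alone
lift       <- confidence / (count_rhs / n)   # confidence relative to the RHS base rate

c(support = support, confidence = confidence, coverage = coverage, lift = lift)
```

The result reproduces the values in row [9]: support 0.8333, confidence 1, coverage 0.8333 and lift 1.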

[Q.N.14] Remove the empty rules from the “association.rule” and inspect the first 10 rules with interpretations.

rules <- apriori(df2, 
parameter = list(supp=0.1, 
conf=0.8, 
maxlen=10, 
minlen=2,
target= "rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.1      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 0 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[28 item(s), 6 transaction(s)] done [0.00s].
## sorting and recoding items ... [28 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(df2, parameter = list(supp = 0.1, conf = 0.8, maxlen = 10, :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
##  done [0.00s].
## writing ... [103782 rule(s)] done [0.02s].
## creating S4 object  ... done [0.05s].

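Setting minlen = 2 requires every rule to contain at least two items, so the LHS can no longer be empty. The rule count drops from 103788 to 103782, which means six rules with an empty LHS were removed.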
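For comparison, the question's settings (1% support, all transactions) could be run as sketched below. This is only an illustration under assumptions: it uses the prebuilt Adult transactions that ship with arules (assumed to be comparable to the Adult object built above; loading it replaces an object of the same name), and no output is shown because it was not run for this report.

```r
library(arules)

data("Adult")  # prebuilt transactions shipped with arules (assumption: comparable to the Adult built above)

# Mine the full data with 1% support, 80% confidence and rule length 2 to 10
association.rule <- apriori(
  Adult,
  parameter = list(supp = 0.01, conf = 0.8,
                   minlen = 2, maxlen = 10,
                   target = "rules"),
  control = list(verbose = FALSE)
)

# With minlen = 2 no rule should have an empty LHS
any(size(lhs(association.rule)) == 0)

# First 10 rules, sorted by support
inspect(head(sort(association.rule, by = "support"), 10))
```

With minlen = 2 in the parameter list, empty-LHS rules are never generated, so no separate filtering step is needed afterwards.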

[Q.N.15] Create a new rule as “capital.gain.rhs.rule” with “capital-gain=None” in the RHS with support of 1%, confidence of 80%, maximum length of 10 and minimum length of 2.

capital_gain_rhs_rule<-apriori(df2, 
parameter = list(supp=0.1, conf=0.8, 
maxlen=10, 
minlen=2),
appearance = list(default="lhs", 
rhs="capital-gain=None"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.1      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 0 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[28 item(s), 6 transaction(s)] done [0.00s].
## sorting and recoding items ... [28 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(df2, parameter = list(supp = 0.1, conf = 0.8, maxlen = 10, :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
##  done [0.00s].
## writing ... [7315 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

Here I create a new rule set in which any item may appear in the LHS, while the RHS is fixed to capital-gain=None.

[Q.N.16] Get summary of this rule and interpret it critically.

summary(capital_gain_rhs_rule)
## set of 7315 rules
## 
## rule length distribution (lhs + rhs):sizes
##    2    3    4    5    6    7    8    9   10 
##   20  132  478 1090 1665 1758 1295  657  220 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   6.000   7.000   6.639   8.000  10.000 
## 
## summary of quality measures:
##     support         confidence        coverage           lift      
##  Min.   :0.1667   Min.   :0.8000   Min.   :0.1667   Min.   :0.960  
##  1st Qu.:0.1667   1st Qu.:1.0000   1st Qu.:0.1667   1st Qu.:1.200  
##  Median :0.1667   Median :1.0000   Median :0.1667   Median :1.200  
##  Mean   :0.1877   Mean   :0.9995   Mean   :0.1882   Mean   :1.199  
##  3rd Qu.:0.1667   3rd Qu.:1.0000   3rd Qu.:0.1667   3rd Qu.:1.200  
##  Max.   :0.8333   Max.   :1.0000   Max.   :1.0000   Max.   :1.200  
##      count      
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :1.000  
##  Mean   :1.126  
##  3rd Qu.:1.000  
##  Max.   :5.000  
## 
## mining info:
##  data ntransactions support confidence
##   df2             6     0.1        0.8
##                                                                                                                                                   call
##  apriori(data = df2, parameter = list(supp = 0.1, conf = 0.8, maxlen = 10, minlen = 2), appearance = list(default = "lhs", rhs = "capital-gain=None"))

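A set of 7315 rules is formed. Rule length ranges from 2 to 10 items with a median of 7. Most rules have a support of 0.1667 (count 1), meaning they match a single one of the six transactions in df2, and the maximum lift is only 1.2, so these rules are too weakly supported to generalize.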

[Q.N.17] Create a new rule as “hours.per.week.ft.rule” with “hours-per-week=Full-time” in the RHS with support of 1%, confidence of 80%, maximum length of 10 and minimum length of 2.

hours.per.week.ft.rule<-apriori(df2, 
parameter = list(supp=0.1, conf=0.8, 
maxlen=10, 
minlen=2),
appearance = list(default="lhs", 
rhs="hours-per-week=Full-time"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.1      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 0 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[28 item(s), 6 transaction(s)] done [0.00s].
## sorting and recoding items ... [28 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(df2, parameter = list(supp = 0.1, conf = 0.8, maxlen = 10, :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
##  done [0.01s].
## writing ... [5676 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

Here a new rule set, “hours.per.week.ft.rule”, is created: any item may appear in the LHS, and the RHS is fixed to hours-per-week=Full-time.

[Q.N.18] Get summary of this rule and interpret it critically.

summary(hours.per.week.ft.rule)
## set of 5676 rules
## 
## rule length distribution (lhs + rhs):sizes
##    2    3    4    5    6    7    8    9   10 
##   13  105  390  874 1304 1350  981  494  165 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   6.000   7.000   6.605   8.000  10.000 
## 
## summary of quality measures:
##     support         confidence    coverage           lift         count      
##  Min.   :0.1667   Min.   :1    Min.   :0.1667   Min.   :1.5   Min.   :1.000  
##  1st Qu.:0.1667   1st Qu.:1    1st Qu.:0.1667   1st Qu.:1.5   1st Qu.:1.000  
##  Median :0.1667   Median :1    Median :0.1667   Median :1.5   Median :1.000  
##  Mean   :0.1695   Mean   :1    Mean   :0.1695   Mean   :1.5   Mean   :1.017  
##  3rd Qu.:0.1667   3rd Qu.:1    3rd Qu.:0.1667   3rd Qu.:1.5   3rd Qu.:1.000  
##  Max.   :0.3333   Max.   :1    Max.   :0.3333   Max.   :1.5   Max.   :2.000  
## 
## mining info:
##  data ntransactions support confidence
##   df2             6     0.1        0.8
##                                                                                                                                                          call
##  apriori(data = df2, parameter = list(supp = 0.1, conf = 0.8, maxlen = 10, minlen = 2), appearance = list(default = "lhs", rhs = "hours-per-week=Full-time"))

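A set of 5676 rules is formed. Every rule has a confidence of 1 and a lift of 1.5, but support is at most 0.3333 (count 2 of the six transactions in df2), so the rules rest on only one or two rows.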

[Q.N.19] Get new rule of “hours.per.week.ft.rule” as “conf.sort.rule” by sorting “hours.per.week.ft.rule” in descending order by “confidence” and inspect the head and tail rules with critical interpretation.

conf.sort.rule <- sort(hours.per.week.ft.rule,
                       by = "confidence",
                       decreasing = TRUE)
inspect(head(conf.sort.rule))
##     lhs                              rhs                          support confidence  coverage lift count
## [1] {race=Black}                  => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
## [2] {relationship=Other-relative} => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
## [3] {marital-status=Widowed}      => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
## [4] {education=HS-grad}           => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
## [5] {age=Senior}                  => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
## [6] {capital-gain=Low}            => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
inspect(tail(conf.sort.rule))
##     lhs                               rhs                          support confidence  coverage lift count
## [1] {age=Middle-aged,                                                                                     
##      workclass=Private,                                                                                   
##      occupation=Prof-specialty,                                                                           
##      relationship=Own-child,                                                                              
##      race=White,                                                                                          
##      sex=Male,                                                                                            
##      capital-gain=None,                                                                                   
##      capital-loss=None,                                                                                   
##      native-country=United-States} => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
## [2] {age=Middle-aged,                                                                                     
##      workclass=Private,                                                                                   
##      education=Bachelors,                                                                                 
##      occupation=Prof-specialty,                                                                           
##      relationship=Own-child,                                                                              
##      race=White,                                                                                          
##      sex=Male,                                                                                            
##      capital-loss=None,                                                                                   
##      native-country=United-States} => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
## [3] {age=Middle-aged,                                                                                     
##      workclass=Private,                                                                                   
##      education=Bachelors,                                                                                 
##      occupation=Prof-specialty,                                                                           
##      relationship=Own-child,                                                                              
##      race=White,                                                                                          
##      capital-gain=None,                                                                                   
##      capital-loss=None,                                                                                   
##      native-country=United-States} => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
## [4] {age=Middle-aged,                                                                                     
##      education=Bachelors,                                                                                 
##      occupation=Prof-specialty,                                                                           
##      relationship=Own-child,                                                                              
##      race=White,                                                                                          
##      sex=Male,                                                                                            
##      capital-gain=None,                                                                                   
##      capital-loss=None,                                                                                   
##      native-country=United-States} => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
## [5] {age=Middle-aged,                                                                                     
##      workclass=Private,                                                                                   
##      education=Bachelors,                                                                                 
##      occupation=Prof-specialty,                                                                           
##      relationship=Own-child,                                                                              
##      sex=Male,                                                                                            
##      capital-gain=None,                                                                                   
##      capital-loss=None,                                                                                   
##      native-country=United-States} => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
## [6] {age=Middle-aged,                                                                                     
##      workclass=Private,                                                                                   
##      education=Bachelors,                                                                                 
##      relationship=Own-child,                                                                              
##      race=White,                                                                                          
##      sex=Male,                                                                                            
##      capital-gain=None,                                                                                   
##      capital-loss=None,                                                                                   
##      native-country=United-States} => {hours-per-week=Full-time} 0.1666667          1 0.1666667  1.5     1
summary(head(conf.sort.rule))
## set of 6 rules
## 
## rule length distribution (lhs + rhs):sizes
## 2 
## 6 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       2       2       2       2       2       2 
## 
## summary of quality measures:
##     support         confidence    coverage           lift         count  
##  Min.   :0.1667   Min.   :1    Min.   :0.1667   Min.   :1.5   Min.   :1  
##  1st Qu.:0.1667   1st Qu.:1    1st Qu.:0.1667   1st Qu.:1.5   1st Qu.:1  
##  Median :0.1667   Median :1    Median :0.1667   Median :1.5   Median :1  
##  Mean   :0.1667   Mean   :1    Mean   :0.1667   Mean   :1.5   Mean   :1  
##  3rd Qu.:0.1667   3rd Qu.:1    3rd Qu.:0.1667   3rd Qu.:1.5   3rd Qu.:1  
##  Max.   :0.1667   Max.   :1    Max.   :0.1667   Max.   :1.5   Max.   :1  
## 
## mining info:
##  data ntransactions support confidence
##   df2             6     0.1        0.8
##                                                                                                                                                          call
##  apriori(data = df2, parameter = list(supp = 0.1, conf = 0.8, maxlen = 10, minlen = 2), appearance = list(default = "lhs", rhs = "hours-per-week=Full-time"))
summary(tail(conf.sort.rule))
## set of 6 rules
## 
## rule length distribution (lhs + rhs):sizes
## 10 
##  6 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      10      10      10      10      10      10 
## 
## summary of quality measures:
##     support         confidence    coverage           lift         count  
##  Min.   :0.1667   Min.   :1    Min.   :0.1667   Min.   :1.5   Min.   :1  
##  1st Qu.:0.1667   1st Qu.:1    1st Qu.:0.1667   1st Qu.:1.5   1st Qu.:1  
##  Median :0.1667   Median :1    Median :0.1667   Median :1.5   Median :1  
##  Mean   :0.1667   Mean   :1    Mean   :0.1667   Mean   :1.5   Mean   :1  
##  3rd Qu.:0.1667   3rd Qu.:1    3rd Qu.:0.1667   3rd Qu.:1.5   3rd Qu.:1  
##  Max.   :0.1667   Max.   :1    Max.   :0.1667   Max.   :1.5   Max.   :1  
## 
## mining info:
##  data ntransactions support confidence
##   df2             6     0.1        0.8
##                                                                                                                                                          call
##  apriori(data = df2, parameter = list(supp = 0.1, conf = 0.8, maxlen = 10, minlen = 2), appearance = list(default = "lhs", rhs = "hours-per-week=Full-time"))
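The quality measures in the two summaries above are tied together by simple arithmetic, which helps when interpreting them. The following is a minimal base R sketch, assuming the usual definitions (support = count / number of transactions, confidence = rule support / coverage, lift = confidence / support of the right-hand side). The variable names are illustrative, and the inputs are the rounded values printed in the output, not values recomputed from df2.

```r
# Values read from the summary output above (rounded as printed).
n_transactions <- 6    # ntransactions in the mining info
rule_count     <- 1    # count: transactions matching lhs and rhs together
lhs_count      <- 1    # coverage 0.1667 corresponds to 1 of 6 transactions
lift           <- 1.5  # lift reported for every rule

support     <- rule_count / n_transactions   # share of transactions matching the whole rule
coverage    <- lhs_count / n_transactions    # share of transactions matching the lhs
confidence  <- support / coverage            # share of lhs matches that also match the rhs
rhs_support <- confidence / lift             # implied support of hours-per-week=Full-time
rhs_count   <- rhs_support * n_transactions  # implied number of Full-time transactions

round(c(support = support, confidence = confidence,
        rhs_support = rhs_support, rhs_count = rhs_count), 4)
```

Under these assumptions, each of the listed rules rests on a single transaction out of six, so a confidence of 1 here says little about how reliable the rule would be on a larger sample.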

[Q.N.20] Plot “hours.per.week.ft.rule” with the arulesViz package using the default plot, method = "two-key plot", engine = "plotly", method = "graph" with engine = "htmlwidget", and a parallel coordinates plot, and interpret each graph carefully

library(arulesViz)
plot(hours.per.week.ft.rule)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

plot(hours.per.week.ft.rule, method = "two-key plot")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

plot(hours.per.week.ft.rule, engine = "plotly")
## Warning: Too many rules supplied. Only plotting the best 1000 using
## 'lift' (change control parameter max if needed).
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
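# Keep the ten rules with the highest confidence so the plots stay readable.
subrules <- head(hours.per.week.ft.rule, n = 10, by = "confidence")
# Interactive graph: items and rules are drawn as connected nodes.
plot(subrules, method = "graph", engine = "htmlwidget")
# Parallel coordinates plot of the same ten rules.
plot(subrules, method = "paracoord")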
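To back up the interpretation of these plots with numbers, the quantities they encode can be tabulated directly. The sketch below is only an illustration: it assumes the built-in `Adult` transactions from arules as a stand-in for the data used in this report, and the thresholds (`conf = 0.6`, `maxlen = 4`) and the object names `rules` and `top` are chosen for the example rather than taken from the analysis above.

```r
library(arules)

# Assumption: the built-in Adult transactions stand in for the report's data.
data("Adult")

# Mine rules with hours-per-week=Full-time on the right-hand side.
# The thresholds are illustrative, not those used earlier in the report.
rules <- apriori(
  Adult,
  parameter  = list(supp = 0.1, conf = 0.6, minlen = 2, maxlen = 4),
  appearance = list(default = "lhs", rhs = "hours-per-week=Full-time"),
  control    = list(verbose = FALSE)
)

# Keep at most the ten rules with the highest confidence.
top <- head(rules, n = 10, by = "confidence")

# Rule length (order): the quantity a two-key plot shows by colour.
table(size(rules))

# Support, confidence and lift: the axes and shading of the default scatter plot.
summary(quality(rules)[, c("support", "confidence", "lift")])
```

Reading the table of rule lengths next to the two-key plot, and the summary of support, confidence and lift next to the default scatter plot, makes it easier to say which region of each plot the strongest rules occupy.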